Abstract

Lam, Gusfield, and Sridhar (2009) showed that a set of three-state characters has
a perfect phylogeny if and only if every subset of three characters has a perfect
phylogeny. They also gave a complete characterization of the sets of three three-state
characters that do not have a perfect phylogeny. However, it is not clear from their
characterization how to find a subset of three characters that does not have a perfect
phylogeny without testing all triples of characters. In this note, we build upon their
result by giving a simple characterization of when a set of three-state characters does
not have a perfect phylogeny that can be inferred from testing all pairs of characters.

1 Introduction 

The k-state perfect phylogeny problem is one of the classic decision problems in
computational biology. The input is an n by m matrix M of integers from the set
{1, ..., k}. We call a row of M a taxon (plural taxa), a column of M a character, and a
value in column c of M a state of character c. A perfect phylogeny for M is an undirected
tree t with n leaves, each labeled by a distinct taxon of M, such that, for each character c
and each pair i, j of distinct states of c, the minimal subtree of t containing all the leaves
labeled by taxa with state i for character c is node-disjoint from the minimal subtree of t
containing all the leaves labeled by taxa with state j for character c. The k-state perfect
phylogeny problem is to decide whether M has a perfect phylogeny. If M has a perfect
phylogeny, we say that the characters in M are compatible; otherwise they are
incompatible. See [6, 17] for more on the perfect phylogeny problem, and see Figure 1 for
an example.

If the number of states of each character is unbounded (so k can grow with n), then
the perfect phylogeny problem is NP-complete [2, 18]. However, if the number of states
of each character is fixed, the perfect phylogeny problem is solvable in $O(m^2 n)$ time
(in fact, in linear time for k = 2) [9, 4, 13, 1, 14]. Each of these algorithms can also
construct a perfect phylogeny for M if one exists. However, since every subset of a
compatible set of characters is itself compatible, if no perfect phylogeny exists for M,
there must be some minimal subset of the characters of M that does not have a perfect
phylogeny. We call such a set a minimal obstruction set for M. None of the algorithms
mentioned above outputs a minimal obstruction set when there is no perfect phylogeny
for M.

Figure 1: Example 3-state perfect phylogeny for input matrix M.

If the characters in M are two-state characters, then M has a perfect phylogeny if and
only if the characters in M are pairwise compatible. Hence, for k = 2 every minimal
obstruction set has cardinality two [3, 16, 18, 5]. A recent breakthrough by Lam, Gusfield,
and Sridhar [15] shows that if the characters in M are three-state characters, then any
minimal obstruction set for M has cardinality at most three. It is conjectured that, for
input matrices M of k-state characters, there exists a function f(k) such that M has a
perfect phylogeny if and only if every subset of f(k) characters of M has a perfect
phylogeny [7, 12, 8, 16, 9, 15, 11]. From the discussion above, it follows that f(2) = 2 and
f(3) = 3. Recent work of Habib and To [11] shows that $f(4) \geq 5$.

If the characters in M are k-state characters and the cardinality of a minimal obstruction
set for M is bounded above by f(k), then it is preferable to have a test for the existence
of such an obstruction set that does not require testing all subsets of f(k) characters in M,
and ideally one that can be inferred from testing all pairs of characters in M. Since we
can decide whether M has a perfect phylogeny in $O(m^2 n)$ time, and construct a perfect
phylogeny when one exists, we should hope to also output a minimal obstruction set in
$O(m^2 n)$ time when a perfect phylogeny for M does not exist.

Here, we focus on the three-state perfect phylogeny problem. Hence, we restrict
M to be an n by m matrix of integers from the set {1, 2, 3}. We build upon the work of
Lam, Gusfield, and Sridhar [15], who showed that if M does not have a perfect phylogeny,
then M has an obstruction set of cardinality at most three. They also gave a complete
characterization of the minimal obstruction sets of cardinality three. However, it is not
clear from their characterization how to find such an obstruction set without independently
testing all triples of characters in M, which requires $O(m^3 n)$ time. In this note, we remedy
this situation by giving a simple characterization of when a set of three-state characters
does not have a perfect phylogeny that can be inferred from testing all pairs of characters
in M. This leads to an $O(m^2 n)$ time algorithm to find an obstruction set when M does not
have a perfect phylogeny. If M does admit a perfect phylogeny, then any of the algorithms
mentioned above can be used to construct a perfect phylogeny for M in $O(m^2 n)$ time.
2 Preliminaries



2.1 Perfect Phylogenies and Partition Intersection Graphs 

The partition intersection graph of M, denoted pig(M), is the graph that has a vertex $c_i$
for each character c and each state i of c, and an edge between two vertices $c_i$ and $d_j$
precisely if there is a taxon that has both state i for character c and state j for character
d. Note that there can be no edges between two vertices of the same character. See Figure
2 for an example. In this section we give a brief overview of some known results relating
three-state perfect phylogenies to partition intersection graphs.
Figure 2: Partition intersection graph of the matrix M from Figure 1.
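As a concrete illustration of the definition, the following Python sketch builds the partition intersection graph of M restricted to a set of columns. It assumes M is given as a list of rows (taxa) of integers, characters are column indices, and vertex $c_i$ is represented by the pair (c, i); the function name `pig` is our own. Only states that actually occur in M get a vertex; an absent state would be an isolated vertex, which affects neither cycles nor paths.

```python
from itertools import combinations


def pig(M, cols):
    """Partition intersection graph of M restricted to the columns in cols.

    Vertex (c, i) stands for state i of character (column) c. Vertices (c, i)
    and (d, j) are adjacent when some taxon (row) has state i for c and state
    j for d. A row has one state per column, so no edge ever joins two states
    of the same character.
    """
    vertices, edges = set(), set()
    for row in M:
        states = [(c, row[c]) for c in cols]
        vertices.update(states)
        for u, v in combinations(states, 2):
            edges.add(frozenset((u, v)))
    return vertices, edges


# Hypothetical three-taxon, two-character example.
V, E = pig([[1, 1], [1, 2], [2, 1]], [0, 1])
print(len(V), len(E))  # 4 3
```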

A graph G is triangulated if and only if it contains no chordless cycle of length four or
greater. A proper triangulation of pig(M) is a triangulated supergraph of pig(M) such that
each edge is between vertices of different characters.

Theorem 1 (Buneman [3], Meacham [16], Steel [18]). There is a perfect phylogeny for M 
if and only if pig(M) has a proper triangulation. 

For a subset $C = \{c_1, \ldots, c_j\}$ of the characters in M, we write $M[c_1, \ldots, c_j]$
to denote M restricted to the columns in C. We say that M is pairwise compatible if, for
every pair a, b of characters in M, there is a perfect phylogeny for M[a, b].

Theorem 2 (Estabrook and McMorris [5]). Let a and b be two characters of M. Then
M[a, b] has a perfect phylogeny if and only if pig(M[a, b]) is acyclic.
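Theorem 2 turns the pairwise test into cycle detection on a graph with at most six vertices. A minimal sketch under the same list-of-rows assumption; `pairwise_compatible` is a hypothetical name, and union-find is just one convenient way to detect a cycle.

```python
def pairwise_compatible(M, a, b):
    """Theorem 2: M[a, b] has a perfect phylogeny iff pig(M[a, b]) is acyclic."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    # Distinct edges of pig(M[a, b]); repeated rows give the same edge.
    edges = {((a, row[a]), (b, row[b])) for row in M}
    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru == rv:
            return False  # u and v are already connected: this edge closes a cycle
        parent[ru] = rv
    return True


print(pairwise_compatible([[1, 1], [1, 2], [2, 1]], 0, 1))          # True
print(pairwise_compatible([[1, 1], [1, 2], [2, 1], [2, 2]], 0, 1))  # False
```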

Theorem 3 (Lam, Gusfield, and Sridhar [15]). M has a perfect phylogeny if and only if,
for every three characters a, b, c in M, M[a, b, c] has a perfect phylogeny.

If three characters are incompatible, then either they are not pairwise compatible,
or, as the following theorem shows, the edge set of their partition intersection graph is a
superset (up to renaming of states) of one of a collection of "forbidden" edge sets.
Theorem 4 (Lam, Gusfield, and Sridhar [15]). Let M be pairwise compatible. Then a
triple {a, b, c} of characters in M is a minimal obstruction set if and only if (possibly
after renaming states) pig(M[a, b, c]) contains all of the edges of one of the graphs of Figure 3.
Figure 3: The forbidden sets of edges for the partition intersection graph of three characters
that have a perfect phylogeny (adapted from Figure 42 in [15]). We note that [15] lists
four forbidden sets of edges; however, one of these sets is a superset of another, so only
three are needed here.

2.2 Solving Three-State Perfect Phylogeny with Two-State Characters 

Here we review a result of Dress and Steel [4]. Our exposition closely follows that of [10]. 

Our goal is to derive a matrix $\overline{M}$ of two-state characters from the matrix M of
three-state characters. The properties of $\overline{M}$ enable us to find a perfect
phylogeny for M. The matrix $\overline{M}$ contains three characters c(1), c(2), c(3) for each
character c in M, such that all of the taxa that have state i for c in M are given state 1 for
character c(i) in $\overline{M}$, and the other taxa are given state 2 for c(i) in $\overline{M}$.
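The construction of $\overline{M}$ is mechanical. A minimal Python sketch, assuming M is a list of rows over {1, 2, 3} and labeling the column for c(i) by the pair (c, i), which is our own convention; `binary_expansion` is a hypothetical name.

```python
def binary_expansion(M):
    """Return (labels, rows) for the two-state matrix M-bar.

    Column (c, i) is character c(i): a taxon gets state 1 for c(i) if it has
    state i for c in M, and state 2 otherwise.
    """
    m = len(M[0])
    labels = [(c, i) for c in range(m) for i in (1, 2, 3)]
    rows = [[1 if row[c] == i else 2 for (c, i) in labels] for row in M]
    return labels, rows


labels, rows = binary_expansion([[1, 3], [2, 3]])
print(labels[:3])  # [(0, 1), (0, 2), (0, 3)]
print(rows[0])     # [1, 2, 2, 2, 2, 1]
```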

Since every character in $\overline{M}$ has two states, two characters c(i) and d(j) of
$\overline{M}$ are incompatible if and only if the two columns corresponding to c(i) and d(j)
contain all four of the pairs (1,1), (1,2), (2,1), and (2,2); otherwise they are compatible.
This is known as the four gametes test [17].
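The four gametes test for c(i) and d(j) can be read directly off M without materializing $\overline{M}$. A sketch under the same assumptions; `incompatible` is a hypothetical helper name.

```python
def incompatible(M, c, i, d, j):
    """Four gametes test for characters c(i) and d(j) of M-bar.

    Each row of M contributes the pair (state of c(i), state of d(j)); the two
    two-state characters are incompatible iff all four pairs occur.
    """
    gametes = {(1 if row[c] == i else 2, 1 if row[d] == j else 2) for row in M}
    return len(gametes) == 4


M = [[1, 1], [1, 2], [2, 1], [2, 2]]
print(incompatible(M, 0, 1, 1, 1))      # True: all four pairs occur
print(incompatible(M[:3], 0, 1, 1, 1))  # False: (2, 2) is missing
```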

Theorem 5 (Dress and Steel [4]). There is a perfect phylogeny for M if and only if there
is a subset C of the characters of $\overline{M}$ such that

(i) the characters in C are pairwise compatible, and

(ii) for each character c in M, C contains at least two of the characters c(1), c(2), c(3).

Theorem 5 was used in [4] to give an $O(m^2 n)$ time algorithm to decide whether there is a
perfect phylogeny for M. It was also used in [10] to reduce the three-state perfect phylogeny
problem in polynomial time to the well-known 2-SAT problem, which is in P.
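For very small inputs, the condition of Theorem 5 can be checked exhaustively, which is handy for testing faster methods. The sketch below is our own illustration, not the algorithm of [4], and `has_pp_by_theorem5` is a hypothetical name. Because every subset of a pairwise compatible set is pairwise compatible, it suffices to keep exactly two of c(1), c(2), c(3) for each character c, giving $3^m$ candidate sets. The second example is a small matrix we constructed for illustration: it is pairwise compatible, yet no candidate set passes.

```python
from itertools import combinations, product


def incompatible(M, c, i, d, j):
    # Four gametes test on the two-state characters c(i) and d(j).
    gametes = {(1 if row[c] == i else 2, 1 if row[d] == j else 2) for row in M}
    return len(gametes) == 4


def has_pp_by_theorem5(M):
    """Exhaustively search for a set C satisfying (i) and (ii) of Theorem 5.

    Exponential in m: for each character c it tries the three ways of keeping
    exactly two of c(1), c(2), c(3).
    """
    m = len(M[0])
    two_of_three = list(combinations((1, 2, 3), 2))
    for choice in product(two_of_three, repeat=m):
        C = [(c, i) for c in range(m) for i in choice[c]]
        if all(not incompatible(M, c, i, d, j)
               for (c, i), (d, j) in combinations(C, 2)):
            return True
    return False


print(has_pp_by_theorem5([[1, 1], [1, 2], [2, 1]]))  # True
print(has_pp_by_theorem5([[1, 1, 1], [1, 1, 2], [2, 1, 1], [2, 2, 3], [3, 2, 2]]))  # False
```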
3 A Simple Characterization of Minimal Obstruction Sets



In this section, we focus on the case where M is pairwise compatible. Our main result is a 
characterization of the situation where M does not have a perfect phylogeny that is based 
on the partition intersection graphs for the pairs of characters in M. Theorem 2 gives a 
simple characterization of the situation when M is not pairwise compatible. 

We say that a state i of a character c of M is dependent precisely when there exist
a character d of M and two distinct states j, k of d such that c(i) is incompatible with both
d(j) and d(k) in $\overline{M}$. The character d is a witness that state i of c is dependent.
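The definition translates directly into a search over all other characters d. A sketch, assuming the list-of-rows representation; `dependent_states` is a hypothetical name and returns, for each dependent state of c, one witness. The example matrix `M` is a small hypothetical input (characters a, b, c are columns 0, 1, 2) that is pairwise compatible.

```python
def incompatible(M, c, i, d, j):
    # Four gametes test on the two-state characters c(i) and d(j).
    gametes = {(1 if row[c] == i else 2, 1 if row[d] == j else 2) for row in M}
    return len(gametes) == 4


def dependent_states(M, c):
    """Map each dependent state i of character c to one witness d."""
    witnesses = {}
    for i in (1, 2, 3):
        for d in range(len(M[0])):
            if d == c:
                continue  # c(i) is compatible with every c(j), so d = c never witnesses
            if sum(incompatible(M, c, i, d, j) for j in (1, 2, 3)) >= 2:
                witnesses[i] = d
                break
    return witnesses


M = [[1, 1, 1], [1, 1, 2], [2, 1, 1], [2, 2, 3], [3, 2, 2]]
print(dependent_states(M, 2))  # {1: 0, 2: 1}
```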

Lemma 6. Let c be a character of M and let i be a dependent state of c. Then no pairwise
compatible subset of the characters of $\overline{M}$ satisfying Theorem 5 contains c(i).

Proof. Let I be a pairwise compatible subset of the characters of $\overline{M}$ that contains
c(i). Since state i of c is dependent, there are a character b of M and two states j, k of b
such that c(i) is incompatible with both b(j) and b(k). It follows that $b(j) \notin I$ and
$b(k) \notin I$. But then I cannot contain two of b(1), b(2), and b(3). Thus, I cannot satisfy
condition (ii) of Theorem 5. □

The next lemma gives a characterization of when a state is dependent using partition
intersection graphs. We first introduce some notation: if $p = p_1 p_2 p_3 p_4 p_5$ is a path of
length four in a graph, then we write middle[p] to denote $p_3$, the middle vertex of p.

Lemma 7. Let M be pairwise compatible. A state i of a character c of M is a dependent
state if and only if there is a character d of M and a path p of length four in pig(M[c, d])
with middle[p] = $c_i$.



Proof. W.l.o.g. assume that i = 1, i.e., $c_i = c_1$.

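(⇒) Since 1 is a dependent state of c, there exists a character d in M such that c(1) is
incompatible with two of d(1), d(2), and d(3). W.l.o.g., assume c(1) is incompatible with
both d(1) and d(2). Then $c_1 d_1$ and $c_1 d_2$ are edges of pig(M[c, d]), since the pair (1,1)
occurs in both four gametes tests. The pair (2,1) shows that each of $d_1$ and $d_2$ is adjacent
to $c_2$ or $c_3$, and, since pig(M[c, d]) is acyclic by Theorem 2, $d_1$ and $d_2$ cannot be adjacent
to the same one. Hence either $d_2 c_2$ and $d_1 c_3$, or $d_2 c_3$ and $d_1 c_2$, are edges of pig(M[c, d]).
If $d_2 c_2$ and $d_1 c_3$ are edges of pig(M[c, d]), then $c_2 d_2 c_1 d_1 c_3$ is the required path of
length four. If $d_2 c_3$ and $d_1 c_2$ are edges of pig(M[c, d]), then $c_3 d_2 c_1 d_1 c_2$ is the
required path of length four.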
Figure 4: Illustrating the proof of Lemma 7. (a) c(1) and d(1) are incompatible. (b) c(1) and
d(2) are incompatible.
(⇐) Let d be a character of M such that there is a path p of length four in pig(M[c, d])
with middle[p] = $c_1$. Since pig(M[c, d]) cannot contain edges between two states of the same
character, the vertices of p alternate between states of c and states of d, so we can assume
w.l.o.g. that p is the path $c_2 d_1 c_1 d_2 c_3$. The columns of c(1) and d(1) then contain the
pairs (1,1) (edge $c_1 d_1$), (2,1) (edge $c_2 d_1$), (1,2) (edge $c_1 d_2$), and (2,2) (edge $c_3 d_2$),
so c(1) is incompatible with d(1); symmetrically, c(1) is incompatible with d(2). This is
illustrated in Figure 4. □
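Lemma 7 gives a purely graph-theoretic test on the constant-size graph pig(M[c, d]). A sketch, assuming M is pairwise compatible and representing vertex $c_i$ as the pair (c, i); `pig_pair`, `has_middle_path`, and `lemma7_witness` are hypothetical names. On the small hypothetical matrix below, the result agrees with the four-gametes definition of a dependent state.

```python
from itertools import permutations


def pig_pair(M, c, d):
    """Adjacency sets of pig(M[c, d]); vertex (x, i) is state i of column x."""
    adj = {}
    for row in M:
        u, v = (c, row[c]), (d, row[d])
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def has_middle_path(adj, v):
    """True if some path p1 p2 p3 p4 p5 with five distinct vertices has p3 == v."""
    for y, z in permutations(adj.get(v, ()), 2):
        for u in adj[y]:
            for w in adj[z]:
                if len({u, y, v, z, w}) == 5:
                    return True
    return False


def lemma7_witness(M, c, i):
    """Return a character d witnessing that state i of c is dependent, or None."""
    for d in range(len(M[0])):
        if d != c and has_middle_path(pig_pair(M, c, d), (c, i)):
            return d
    return None


M = [[1, 1, 1], [1, 1, 2], [2, 1, 1], [2, 2, 3], [3, 2, 2]]
print([lemma7_witness(M, 2, i) for i in (1, 2, 3)])  # [0, 1, None]
```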

Lemma 8. If M is pairwise compatible and there is a character c of M that has two
dependent states, then no perfect phylogeny exists for M.

Proof. Let i and j be two dependent states of c, and suppose that I is a pairwise compatible
subset of the characters of $\overline{M}$ that satisfies the condition required in Theorem 5. By
Lemma 6, I contains neither c(i) nor c(j). But then I contains at most one of c(1), c(2), and
c(3), contradicting condition (ii). Hence no such subset I exists, and, by Theorem 5, there is
no perfect phylogeny for M. □

We now show that the converse of Lemma 8 holds. 

Lemma 9. If M is pairwise compatible and has no perfect phylogeny, then there exists a 
character c of M that has two dependent states. 

Proof. By Theorem 4, there exist characters a, b, c in M such that G = pig(M[a, b, c])
(possibly after renaming states) contains all of the edges of at least one of the subgraphs
of Figure 3. If G contains all of the edges of Figure 3a, then G contains a path of length
four with middle vertex $c_1$ and a path of length four with middle vertex $c_2$, each lying in
pig(M[c, d]) for some $d \in \{a, b\}$; by Lemma 7, these paths witness that states 1 and 2 of c
are dependent (this is illustrated in Figure 5a). The same holds if G contains all of the edges
of Figure 3b (illustrated in Figure 5b) or of Figure 3c (illustrated in Figure 5c). In all three
cases, M contains a character that has two dependent states. □



Figure 5: Illustrating the proof of Lemma 9.
Lemmas 8 and 9 together immediately imply our main theorem. 
Theorem 10. If M is pairwise compatible, then there is a perfect phylogeny for M if and
only if there is at most one dependent state of each character c of M. 

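Observation 11. Let M be pairwise compatible and let c be a character of M with two
dependent states. Let a be a witness for one dependent state of c and let b be a witness for
another dependent state of c. Then the set {a, b, c} is an obstruction set for M. Indeed, a and
b are still witnesses in M[a, b, c], so M[a, b, c] has no perfect phylogeny by Lemma 8; since M
is pairwise compatible, {a, b, c} is moreover a minimal obstruction set.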
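Observation 11 gives a direct way to extract an obstruction set once witnesses are known. A sketch, assuming (without checking) that M is pairwise compatible; `witness` and `obstruction_from_dependent_states` are hypothetical names, and the loop structure is ours rather than the paper's. Each four gametes test costs O(n) and there are $O(m^2)$ of them, so this also runs in $O(m^2 n)$ time.

```python
def incompatible(M, c, i, d, j):
    # Four gametes test on the two-state characters c(i) and d(j).
    gametes = {(1 if row[c] == i else 2, 1 if row[d] == j else 2) for row in M}
    return len(gametes) == 4


def witness(M, c, i):
    """Some character d witnessing that state i of c is dependent, else None."""
    for d in range(len(M[0])):
        if d != c and sum(incompatible(M, c, i, d, j) for j in (1, 2, 3)) >= 2:
            return d
    return None


def obstruction_from_dependent_states(M):
    """Return {a, b, c} as in Observation 11, or set() if no character has two
    dependent states (then, by Theorem 10, M has a perfect phylogeny).
    Assumes, without checking, that M is pairwise compatible."""
    for c in range(len(M[0])):
        found = []
        for i in (1, 2, 3):
            d = witness(M, c, i)
            if d is not None:
                found.append(d)
        if len(found) >= 2:
            return {c, found[0], found[1]}
    return set()


M = [[1, 1, 1], [1, 1, 2], [2, 1, 1], [2, 2, 3], [3, 2, 2]]
print(sorted(obstruction_from_dependent_states(M)))  # [0, 1, 2]
```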

This leads to the following $O(m^2 n)$ time algorithm to find a minimal obstruction set
for M, if one exists.



Algorithm 1 MinimalObstructionSet(M)

Input: M is an n by m matrix of integers from the set {1, 2, 3}.

Output: A minimal obstruction set for M if one exists, otherwise the empty set.

1  for all characters x in M do
2      for all states i of x do
3          mark[x_i] ← ∅;
4  S ← ∅;
5  for all pairs of characters a, b in M do
6      G ← pig(M[a, b]);
7      if G contains a cycle then
8          return {a, b};
9      else if S = ∅ then
10         for all x_i ∈ {a_1, a_2, a_3, b_1, b_2, b_3} such that mark[x_i] is empty do
11             if there is a path p of length four in G with middle[p] = x_i then
12                 mark[x_i] ← {a, b} \ {x};
13         if two states i, j of a have non-empty marks then
14             S ← {a} ∪ mark[a_i] ∪ mark[a_j];
15         else if two states i, j of b have non-empty marks then
16             S ← {b} ∪ mark[b_i] ∪ mark[b_j];
17 return S;
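A direct Python transcription of Algorithm 1 follows, as a sketch rather than a reference implementation. It assumes M is a non-empty list of rows over {1, 2, 3}, returns a set of column indices, and uses hypothetical helper names (`pig_pair`, `has_cycle`, `has_middle_path`); cycle detection by depth-first search is our choice, and any linear-time method would do. Comments map each part back to the numbered lines above.

```python
from itertools import combinations, permutations


def pig_pair(M, a, b):
    """Adjacency sets of pig(M[a, b]); vertex (x, i) is state i of column x."""
    adj = {}
    for row in M:
        u, v = (a, row[a]), (b, row[b])
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def has_cycle(adj):
    """Detect a cycle in a simple undirected graph with an iterative DFS."""
    seen = set()
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        stack = [(start, None)]
        while stack:
            v, parent = stack.pop()
            for w in adj[v]:
                if w == parent:
                    continue
                if w in seen:
                    return True  # non-tree edge v-w closes a cycle
                seen.add(w)
                stack.append((w, v))
    return False


def has_middle_path(adj, v):
    """True if some path p1 p2 p3 p4 p5 (distinct vertices) has p3 == v."""
    for y, z in permutations(adj.get(v, ()), 2):
        for u in adj[y]:
            for w in adj[z]:
                if len({u, y, v, z, w}) == 5:
                    return True
    return False


def minimal_obstruction_set(M):
    """Sketch of Algorithm 1; returns a set of column indices (possibly empty)."""
    m = len(M[0])
    mark = {(x, i): set() for x in range(m) for i in (1, 2, 3)}  # lines 1-3
    S = set()                                                    # line 4
    for a, b in combinations(range(m), 2):                       # line 5
        G = pig_pair(M, a, b)                                    # line 6
        if has_cycle(G):                                         # lines 7-8
            return {a, b}
        if not S:                                                # line 9
            for x in (a, b):                                     # lines 10-12
                for i in (1, 2, 3):
                    if not mark[(x, i)] and has_middle_path(G, (x, i)):
                        mark[(x, i)] = {a, b} - {x}
            for x in (a, b):                                     # lines 13-16
                marked = [i for i in (1, 2, 3) if mark[(x, i)]]
                if len(marked) >= 2:
                    S = {x} | mark[(x, marked[0])] | mark[(x, marked[1])]
                    break
    return S                                                     # line 17


print(minimal_obstruction_set([[1, 1, 1], [1, 1, 2], [2, 1, 1], [2, 2, 3], [3, 2, 2]]))  # {0, 1, 2}
print(minimal_obstruction_set([[1, 1], [1, 2], [2, 1], [2, 2]]))  # {0, 1}
print(minimal_obstruction_set([[1, 1], [1, 2], [2, 1]]))  # set()
```

The first input is a small pairwise compatible matrix constructed for illustration; the second fails the pairwise test and the third has a perfect phylogeny.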



The correctness of the algorithm follows from Theorem 2, Theorem 10, Observation
11, and Lemma 7. To see that the algorithm takes $O(m^2 n)$ time, note that the running time
is dominated by the loop of lines 5-16, which executes once for each of the $O(m^2)$ pairs of
characters in M. Constructing the partition intersection graph of two three-state characters
takes O(n) time. Since the partition intersection graph of two three-state characters has
constant size, each of the other operations performed in the loop takes constant time.

We note that if no obstruction set exists for M, then a perfect phylogeny for M can
be constructed in $O(m^2 n)$ time by using one of the existing algorithms for the three-state
perfect phylogeny problem [4, 13, 1, 14, 10].